Download paginated API data to a CSV

Python

Author: Sandy Rogers
Affiliation: MGnify team at EMBL-EBI


Fetch paginated data from the MGnify API, and save it as a CSV file

The MGnify API returns paginated data. When you list data, it comes to you in pages, or chunks. You have to request each page in turn. The jsonapi_client package can do this for you, automatically.
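To make "request each page in turn" concrete, here is a minimal sketch of the loop that a client library runs on your behalf. It assumes the JSON:API convention that each page carries its records under `data` and the URL of the following page under `links.next`; whether every MGnify endpoint follows exactly this layout is an assumption here. The names `iterate_pages`, `fetch_page` and `FAKE_PAGES` are hypothetical, and the fake pages stand in for real HTTP responses so the example runs offline.

```python
from typing import Callable, Iterator


def iterate_pages(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every record from a paginated JSON:API-style listing."""
    url = first_url
    while url:
        page = fetch_page(url)           # one request per page
        yield from page.get("data", [])  # the records on this page
        # Assumed layout: the next page's URL sits under links.next,
        # and is missing or None on the last page.
        url = (page.get("links") or {}).get("next")


# Offline stand-in for the API: two fake pages keyed by URL (hypothetical data).
FAKE_PAGES = {
    "page-1": {"data": [{"id": "1"}, {"id": "2"}], "links": {"next": "page-2"}},
    "page-2": {"data": [{"id": "3"}], "links": {"next": None}},
}

records = list(iterate_pages("page-1", FAKE_PAGES.__getitem__))
print([r["id"] for r in records])
```

With a real HTTP client, `fetch_page` could be something like `lambda url: requests.get(url).json()`. In practice jsonapi_client handles this loop for you, so the sketch is only meant to show what happens behind the scenes.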

This example shows you how to download a paginated list of data and save it as a CSV file.

You can find all of the other “API endpoints” using the Browsable API interface in your web browser. The URL you see in the browsable API is exactly the same as the one you can use in this code.
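As a small illustration of how a browsable API URL relates to the endpoint name used in code, the sketch below strips the API base from a URL copied out of the browser. The helper `endpoint_from_url` is hypothetical (it is not part of any MGnify library); the base URL is the one used by the code in this notebook.

```python
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def endpoint_from_url(browsable_url: str, base: str = API_BASE) -> str:
    """Return the part of a browsable-API URL that follows the API base."""
    if not browsable_url.startswith(base):
        raise ValueError(f"{browsable_url!r} is not under {base!r}")
    # Drop any query string (such as a page number), then the surrounding slashes.
    path = browsable_url[len(base):].split("?", 1)[0]
    return path.strip("/")


endpoint = endpoint_from_url("https://www.ebi.ac.uk/metagenomics/api/v1/super-studies")
print(endpoint)
```

The resulting string is the kind of value the `api_endpoint` variable is expected to hold.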

This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press Shift+Enter.


We pick an API endpoint for the kind of data to download:

from lib.variable_utils import get_variable_from_link_or_input

# You can also just directly set the api_endpoint variable in code, like this:
# api_endpoint = 'super-studies'

api_endpoint = get_variable_from_link_or_input('API_ENDPOINT', 'API Endpoint', 'super-studies')

Using API Endpoint super-studies from the link you followed.

Using "super-studies" as API Endpoint
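If you run this code outside the notebook environment, the `lib.variable_utils` helper may not be available. One possible stand-in, sketched below, reads the endpoint from an environment variable and falls back to a default. This is an assumption for illustration only: `get_variable_or_default` is a hypothetical function and does not reproduce how `get_variable_from_link_or_input` actually works.

```python
import os


def get_variable_or_default(env_name: str, default: str) -> str:
    """Hypothetical stand-in: read a setting from the environment, else use the default."""
    value = os.environ.get(env_name, "").strip()
    return value or default


api_endpoint = get_variable_or_default("API_ENDPOINT", "super-studies")
print(f'Using "{api_endpoint}" as API Endpoint')
```

Setting `api_endpoint` directly in code, as in the commented-out line above, remains the simplest option.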

Use jsonapi_client to go through the paginated data. Note that this may take quite a long time for long lists, because the API automatically slows down your requests if you ask for a lot of data. This keeps the service working well for everybody else.

We use pandas, an excellent library for data analysis, to normalise the data into a table.

from jsonapi_client import Session
import pandas as pd

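with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    # iterate() follows the pagination and yields one resource at a time;
    # .json gives each resource as a plain dictionary
    resources = map(lambda r: r.json, mgnify.iterate(api_endpoint))
    # Flatten nested fields into dotted columns, e.g. attributes.title
    resources = pd.json_normalize(resources)
    # Write the table to a CSV file named after the endpoint
    resources.to_csv(f"{api_endpoint}.csv")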
resources
  | type | id | attributes.super-study-id | attributes.title | attributes.url-slug | attributes.description | attributes.image-url | attributes.biomes-count
0 | super-studies | 1 | 1 | Tara Oceans | tara-oceans | The Tara Oceans expedition (Karsenti et al. 20... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 0
1 | super-studies | 2 | 2 | Earth Microbiome Project | earth-microbiome-project | The Earth Microbiome Project is now available ... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 0
2 | super-studies | 3 | 3 | NASA GeneLab Microbiome (MANGO) | nasa-genelab-microbiome-mango | Project MANGO provides access to the microbiom... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 0
3 | super-studies | 4 | 4 | HoloFood | holofood | Holistic approach to improve the efficiency of... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 2
4 | super-studies | 5 | 5 | Malaspina | malaspina | The Malaspina circumnavigation expedition was ... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 0
5 | super-studies | 6 | 6 | AtlantECO | atlanteco | The EU-funded AtlantECO project aims to develo... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 0
6 | super-studies | 7 | 7 | FindingPheno | findingpheno | FindingPheno is creating an integrated computa... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 7
7 | super-studies | 8 | 8 | National Mouse Genetics Network (NMGN) Microbi... | nmgn-microbiome | The Microbiome Cluster of the National Mouse G... | data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA... | 0
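The dotted column names in this output (such as attributes.title) come from flattening the nested API records. The sketch below shows roughly what that flattening involves, using only the Python standard library so it runs without pandas or a network connection. It is a plain-Python stand-in written for illustration, not pandas's implementation; the `flatten` helper is hypothetical and the sample record is a trimmed-down version of the first row above.

```python
import csv
import io


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Trimmed-down record, shaped like the rows in the output above.
sample = {
    "type": "super-studies",
    "id": "1",
    "attributes": {"title": "Tara Oceans", "url-slug": "tara-oceans"},
}

rows = [flatten(sample)]

# Write the flattened rows as CSV text (to memory rather than to a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With pandas, `pd.json_normalize` performs the flattening step and `to_csv` the writing step. Note that `to_csv` also writes the row numbers as a first column by default; passing `index=False` leaves them out.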